JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence

Authors

  • Niloy Mukherjee
  • Braja Gopal Patra
  • Dipankar Das
  • Sivaji Bandyopadhyay

Abstract

Complex word identification is the task of identifying the words in a sentence that readers from a specific target audience would find difficult. It is of great importance to lexical simplification, which improves the readability of texts containing challenging words. As participants in the SemEval-2016 Task 11 shared task, we developed two systems that use various lexical and semantic features to identify complex words: one based on a Naïve Bayes classifier and the other on a Random Forest classifier. The Naïve Bayes system achieves the highest G-score, 76.7%, after rule-based post-processing is applied.
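
The G-score reported above can be illustrated with a short sketch. It assumes the shared task's definition of G-score as the harmonic mean of accuracy and recall, where recall is computed over the words labelled complex. The function name `g_score` and the 1/0 label encoding are illustrative choices, not taken from the paper.

```python
def g_score(y_true, y_pred):
    """Harmonic mean of accuracy and recall (assumed task definition).

    Labels are assumed to be 1 for "complex" and 0 for "simple".
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    # Accuracy: fraction of all words whose label was predicted correctly.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)

    # Recall: fraction of truly complex words that the system flagged as complex.
    positives = sum(1 for t in y_true if t == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = true_pos / positives if positives else 0.0

    if accuracy + recall == 0:
        return 0.0
    return 2 * accuracy * recall / (accuracy + recall)


if __name__ == "__main__":
    gold = [1, 0, 0, 1, 0]
    pred = [1, 0, 1, 0, 0]
    # Accuracy = 3/5, recall = 1/2, so G = 2 * 0.6 * 0.5 / 1.1
    print(round(g_score(gold, pred), 4))  # → 0.5455
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot score well simply by labelling most words as simple (high accuracy, low recall) or most words as complex (high recall, lower accuracy).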

Similar resources

TALN at SemEval-2016 Task 11: Modelling Complex Words by Contextual, Lexical and Semantic Features

This paper presents the participation of the TALN team in the Complex Word Identification Task of SemEval-2016 (Task 11). The purpose of the task was to determine whether a word in a given sentence would be judged complex by a certain target audience. To experiment with word complexity identification approaches, the task organizers provided a training set of 2,237 words judged as complex or not ...

LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles

We present the description of the LTG entry in the SemEval-2016 Complex Word Identification (CWI) task, which aimed to develop systems for identifying complex words in English sentences. Our entry focused on the use of contextual language model features and the application of ensemble classification methods. Both of our systems achieved good performance, ranking in 2nd and 3rd place overall in ...

MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier

This paper describes team MAZA entries for the 2016 SemEval Task 11: Complex Word Identification (CWI). The task is a binary classification task in which systems are trained to predict whether a word in a sentence is considered to be complex or not. We developed our two systems for this task based on classifier stacking using decision stumps and decision trees. Our best system, using contextual...

Melbourne at SemEval 2016 Task 11: Classifying Type-level Word Complexity using Random Forests with Corpus and Word List Features

SemEval 2016 task 11 involved determining whether words in a sentence were complex or simple for a cohort of people with English as a second language. Training data consisted of 200 annotated sentences, representing the combined judgements of 20 human annotators, such that if any annotator of the group labelled a word as complex, then it was considered to be complex. Testing was based on single...

CoastalCPH at SemEval-2016 Task 11: The importance of designing your Neural Networks right

We present two methods for the automatic detection of complex words in context as perceived by non-native English readers, for the SemEval 2016 Task 11 on Complex Word Identification (Paetzold and Specia, 2016). The submitted systems exploit the same set of features, but are highly disparate in (i) their learning algorithm and (ii) their angle on the learning objective, where especially the lat...

Publication date: 2016